Abstract

A  method  for  deformable  shape  detection  and  recognition 
is  described.  Deformable  shape  templates  are  used  to  par¬ 
tition  the  image  into  a  globally  consistent  interpretation, 
determined  in  part  by  the  minimum  description  length  prin¬ 
ciple.  Statistical  shape  models  enforce  the  prior  probabil¬ 
ities  on  global,  parametric  deformations  for  each  object 
class.  Once  trained,  the  system  autonomously  segments 
deformed  shapes  from  the  background,  while  not  merging 
them  with  adjacent  objects  or  shadows.  The  formulation 
can  be  used  to  group  image  regions  based  on  any  image  ho¬ 
mogeneity  predicate;  e.g.,  texture,  color,  or  motion.  The  re¬ 
covered  shape  models  can  be  used  directly  in  object  recog¬ 
nition.  Experiments  with  color  imagery  are  reported. 

1  Introduction 

Segmentation  using  traditional  low-level  image  processing 
techniques,  such  as  region  growing,  edge  detection,  and 
mathematical  morphology  operations,  requires  a  consider¬ 
able  amount  of  interactive  guidance  in  order  to  get  satis¬ 
factory  results.  Automating  these  model-free  approaches 
is  difficult  because  of  noise,  shape  complexity,  illumina¬ 
tion,  inter-reflection,  shadows,  and  variability  within  and 
across  individual  objects. 

One  can  exploit  prior  knowledge  to  sufficiently  con¬ 
strain  the  segmentation  problem.  When  available,  such 
information  can  be  used  to  eliminate  ambiguities  and  re¬ 
duce  computational  complexity  in  finding  optimal  group¬ 
ings  of  image  regions.  For  instance,  model-based  segmen¬ 
tation  can  be  used  in  concert  with  image  preprocessing  to 
guide  and  constrain  region  grouping  [13,  28,  35]. 

The  use  of  models  in  segmentation  is  not  a  panacea, 
however.  Due  to  shape  deformation  and  variation  within 
object  classes,  a  simple  rigid  model-based  approach  will 
break  down  in  general.  This  led  to  the  use  of  deformable 
shape  models  in  image  segmentation  [7, 18, 20, 22, 31, 38]. 

Another strategy is to utilize image features that are
somewhat invariant to illumination [6, 16], or to directly
model the physics of illumination, color, shadows, and
surface inter-reflections [14, 23]. Such approaches have been
shown to improve segmentation accuracy, and could be
combined with model-based methods.

The above-mentioned techniques make mistakes in
merging regions, even in constrained contexts, because
local constraints are in general insufficient. For more reliable
segmentation, global consistency must be enforced. This
idea is embodied in the principle of global coherence [33]:
the best partitioning is the one that globally and
consistently explains the greatest portion of the sensed data.
Ideally, this should be coupled with the minimum description
length (MDL) principle: the simplest region segmentation
explaining the observations is the best [11, 21, 24, 39].

Finding the globally consistent, MDL image labeling is
impractical in general due to the computational complexity
of global optimization algorithms. This has led to the use
of parallel algorithms [11, 24] or approximation algorithms
[5, 8, 15, 21, 29, 32, 37, 39].

2  Overview  of  Approach 

The  above  mentioned  work  leads  to  the  development  of  our 
approach.  Deformable  shape  templates  are  used  to  parti¬ 
tion  the  image  into  a  globally  consistent  interpretation,  de¬ 
termined  in  part  by  the  MDL  principle.  The  formulation 
can  be  used  to  group  image  regions  based  on  any  image 
homogeneity  predicate;  e.g.,  texture,  color,  or  motion. 

Each  shape  template  is  specified  in  terms  of  global  warp¬ 
ing  functions  applied  to  a  closed  polygon.  In  the  imple¬ 
mentation,  the  prior  distribution  on  global  deformations 
for  each  shape  is  assumed  Gaussian,  and  estimated  using 
region  segmentations  provided  in  a  training  set.  In  our  ex¬ 
periments,  approximately  40  training  images  are  needed 
to  train  a  model.  Once  trained,  the  system  autonomously 
segments  deformed  shapes  from  the  background,  while  not 
merging  them  with  adjacent  objects  or  shadows. 

We  will  now  give  a  brief  overview  of  the  segmentation 
process  as  it  is  applied  to  find  four  bananas  in  the  example 
image  of  Fig.  1(a).  First,  the  input  color  image  is  over¬ 
segmented  via  standard  region-merging  algorithms  [2,  9], 
as  shown  in  Fig.  1(b).  Using  this  over-segmentation,  can¬ 
didate  regions  for  interesting  objects  are  determined  based 
on  their  color  features  [6]. 

Next  an  edge  map  is  computed  for  the  input  image,  as 
shown  in  Fig.  1(c).  The  edge  map  is  used  to  constrain  con¬ 
sideration  of  possible  grouping  hypotheses  later  in  region 
merging. Notable edges and their strengths can be detected
via standard image processing methods.

Figure 1: Example input and precomputation: (a) input image,
(b) over-segmentation, (c) edge map, (d) deformable template.

Figure 2: Result: (a) selected region groupings, (b) model-
guided region merging, (c) recovered parametric shape models.

The system then tests various combinations of candidate
region groupings. For each grouping hypothesis, we
recover the model alignment and deformations needed to
match the grouping. Fig. 1(d) shows the template used for
grouping regions in this example. Goodness of fit is
determined by a cost measure that includes: 1.) a region color
compatibility term, 2.) a region/model area overlap term,
and 3.) a deformation term. The third term enforces a
priori constraints on the allowable deformations for a
particular deformable shape class (e.g., bananas). The template
“prefers” to deform in ways that are consistent with the
prior distribution on the deformation parameters.

In theory, the system should exhaustively test all possible
combinations of region groupings, and select the best
ones for merging. In practice, region adjacency and edge
map constraints are used to prune the search. Despite this, the
worst-case computational complexity remains exponential.
To make the problem tractable, we employ algorithms that
find an approximately optimal solution: best-first,
simulated annealing, or highest confidence first.

The approximately optimal region groupings obtained
via the best-first algorithm are shown in Fig. 2(a). These
groupings can then be merged in the color image
segmentation, as shown in Fig. 2(b). Note that region merging
and object identification are executed simultaneously. The
system also recovers a deformable template
description for each region grouping, as shown in Fig. 2(c).
Recovered template parameters can be used in estimating
the likelihood that a shape belongs to a particular class.

3  Related  Work 

Previous  approaches  are  based  on  the  active  contours 
paradigm  [22].  The  snake  formulation  can  be  extended 
to  include  a  term  that  enforces  homogeneous  properties 
over  the  region  during  region  growing  [7,  18,  20,  31,  38]. 
This  hybrid  approach  offers  the  advantages  of  both  region- 
based  and  deformable  modeling  techniques,  and  tends  to 
be  more  robust  with  respect  to  model  initialization  and 
noisy  data.  However,  it  requires  hand-placement  of  the  ini¬ 
tial  model,  or  a  user-specified  seed  point  on  the  interior  of 
the region. One proposed solution is to scatter many region
seeds at random over the image, followed by segmentation
guided via Bayes/MDL criteria [11, 39].

Other approaches use special-purpose deformable
templates [19, 26, 38]; e.g., to model facial features, such as
eyes [38]. The template-based approach allows for
inclusion of object-specific knowledge in the model. This
further constrains segmentation, resulting in enhanced
robustness to occlusion and noise. Under certain conditions,
deformable templates can be derived semi-automatically, via
statistical analysis of shape training data [10, 27]. The
estimated probability density function (PDF) for the shape
deformation parameters can be used in ML-estimation of
segmentation and in Bayesian recognition methods.

From  another  view,  image  segmentation  is  a  labeling 
problem;  the  ideal  segmentation  should  be  globally  consis¬ 
tent  or  nearest  to  the  one  with  maximum  likelihood.  This 
has  led  to  various  relaxation  labeling  or  stochastic  labeling 
methods  that  are  related  to  general  optimization  algorithms 
[3, 17, 12].  Nearly  all  require  some  prior  information,  such 
as  the  number  of  labels  needed  or  the  probability  distribu¬ 
tion  of  labels  in  the  image.  Such  information  is  not  always 
available  for  general  imagery. 

After defining the criterion function for labeling, the
next problem is computing the solution to the optimization
problem. A number of proposed approaches employ
simulated or deterministic annealing [5, 32, 15, 29, 37] (for
a comparison see [25]). Chou and Brown [8] used highest
confidence first (HCF) to infer a unique labeling from the
a posteriori distribution that is consistent with both the prior
knowledge and evidence. Their method is analogous to
deterministic annealing, but computation is more efficient.

A number of authors have proposed a formulation of the
image partitioning problem that is based on the minimum
description length (MDL) principle [11, 21, 24, 39]. MDL
is based on information-theoretic arguments: the simplest
model explaining the observations is the best. It also
results in an objective function with no arbitrary thresholds.
As will be seen, the global cost function employed in our
system is compatible with the MDL principle.

4  Deformable  Model  Formulation 

In  our  system,  a  deformable  model  is  used  to  guide  group¬ 
ing  of  image  regions.  Shape  is  specified  in  terms  of  global 
warping  functions  applied  to  a  closed  polygon,  also  known 
as  a  template.  The  global  warping  can  be  generic,  and 
is  determined  by  a  vector  of  warping  coefficients,  a.  To 
demonstrate  the  approach,  we  implemented  a  system  that 
uses  quadratic  polynomials  to  model  global  deformation 
due  to  scaling,  shearing,  bending,  and  tapering. 
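
As a minimal sketch of a global quadratic warp applied to a polygonal template, the following Python fragment maps each vertex through a second-order polynomial in x and y. The particular parameterization (six coefficients per output coordinate, named `coeffs_x` and `coeffs_y`) is an assumption made for illustration; the text does not give the exact form of the warping functions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quadratic_basis(x: float, y: float) -> List[float]:
    """Monomials up to degree two: 1, x, y, x^2, xy, y^2."""
    return [1.0, x, y, x * x, x * y, y * y]


def warp_polygon(template: Sequence[Point],
                 coeffs_x: Sequence[float],
                 coeffs_y: Sequence[float]) -> List[Point]:
    """Apply a global quadratic warp to every vertex of a closed polygon.

    coeffs_x and coeffs_y are hypothetical names for the two halves of the
    warping coefficient vector; each holds six polynomial coefficients.
    """
    warped = []
    for x, y in template:
        basis = quadratic_basis(x, y)
        new_x = sum(c * b for c, b in zip(coeffs_x, basis))
        new_y = sum(c * b for c, b in zip(coeffs_y, basis))
        warped.append((new_x, new_y))
    return warped


# Coefficients that leave the template unchanged (x' = x, y' = y).
IDENTITY_X = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
IDENTITY_Y = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

In this parameterization the linear coefficients produce scaling and shearing, while the second-order coefficients produce bending and tapering; for example, adding a nonzero x^2 coefficient to `coeffs_y` bends the template.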

In a traditional active contours formulation, smoothness
and bending operators are defined over the control points of
the model to obtain a stiffness matrix, $K$. In a deformable
template formulation, we instead define a stiffness matrix
over the deformation parameters. The strain energy is thus
expressed in the template's deformation parameter space:

$$E_{\mathrm{strain}} = \tilde{a}^{T} K \tilde{a} \qquad (1)$$

where $\tilde{a} = a - \bar{a}$ is a vector describing parameter
displacement from a zero strain “rest” state.

There is a well understood link between active models
and statistical estimation [10, 27, 36, 34]. Let us assume
that the distribution on deformation parameters for a
particular shape category can be modeled as a multi-dimensional
normal distribution. The distribution is characterized by its
mean $\bar{a}$ and covariance matrix $\Sigma$. For a given deformation
parameter vector $a$, the sufficient statistic for
characterizing likelihood is the Mahalanobis distance:

$$E_{\mathrm{deform}} = \tilde{a}^{T} \Sigma^{-1} \tilde{a}, \qquad (2)$$

where $\tilde{a} = a - \bar{a}$. Thus the inverse covariance is essentially
a “statistical stiffness matrix.” As will be described, $\bar{a}$ and
$\Sigma$ are acquired via supervised learning.
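
The deformation energy can be illustrated with a short Python sketch that estimates the mean and covariance from a set of training parameter vectors and then evaluates the Mahalanobis distance for a new vector. The function names are hypothetical, the sample covariance uses the usual n - 1 normalization (an assumption, since the text does not specify the estimator), and a small Gaussian elimination routine stands in for whatever linear algebra the original system used.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(samples: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of the training parameter vectors."""
    n, d = len(samples), len(samples[0])
    return [sum(s[k] for s in samples) / n for k in range(d)]


def covariance_matrix(samples: Sequence[Sequence[float]], mean: Vector) -> Matrix:
    """Sample covariance (n - 1 normalization) of the training vectors."""
    n, d = len(samples), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        diff = [s[k] - mean[k] for k in range(d)]
        for r in range(d):
            for c in range(d):
                cov[r][c] += diff[r] * diff[c]
    return [[v / (n - 1) for v in row] for row in cov]


def solve(matrix: Matrix, rhs: Vector) -> Vector:
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    d = len(rhs)
    aug = [list(matrix[r]) + [rhs[r]] for r in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, d):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, d + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * d
    for r in reversed(range(d)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, d))
        x[r] = (aug[r][d] - tail) / aug[r][r]
    return x


def deformation_energy(a: Vector, mean: Vector, cov: Matrix) -> float:
    """Mahalanobis distance: (a - mean)^T cov^{-1} (a - mean)."""
    diff = [a[k] - mean[k] for k in range(len(a))]
    weighted = solve(cov, diff)  # cov^{-1} (a - mean) without forming the inverse
    return sum(diff[k] * weighted[k] for k in range(len(a)))
```

A parameter vector equal to the training mean has zero energy, and the energy grows fastest along directions in which the training shapes varied least.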

An eigenvector transform is used to precondition the
problem by diagonalizing (decoupling) the stiffness matrix
[30, 10]. This reduces the computational complexity of
evaluating Eqs. 1 and 2 and improves the model's
robustness to noise. During model fitting, deformations are
recovered in the decoupled parameter space.

4.1  Model  Fitting 

One important step in the image partitioning procedure
is to fit each region grouping hypothesis with deformable
models from the object library. During segmentation, the
shape model is deformed to match each grouping
hypothesis $g_i$ in such a way as to minimize a cost function:

$$E(g_i) = E_{\mathrm{color}} + \alpha E_{\mathrm{area}} + \beta E_{\mathrm{deform}}, \qquad (3)$$

where $\alpha$ and $\beta$ are scalars that control the importance of the
three terms. The color compatibility term $E_{\mathrm{color}}$ is simply
the norm of the color covariance matrix for pixels within the
current region grouping. The region/model area overlap
term $E_{\mathrm{area}}$ is computed from three areas: $S_G$, the area
of the region grouping hypothesis; $S_m$, the area of the
deformed model; and $S_c$, the common area between the
regions and the deformed model. By using the degree of overlap
in our cost measure, we can avoid the problem of finding
direct correspondence between landmark points, which is
not easy in the presence of large deformations.

Various approaches to minimizing such a cost
function have been suggested in the literature: graduated
non-convexity [4], multi-grid approaches [36], and nonlinear
programming methods [1]. In our system, we employ the
downhill-simplex method because it requires only function
evaluations, not derivatives. Though it is not very efficient
in terms of the number of function evaluations that it
requires, it is still suitable for our application since it is
fully automatic and reliable. Due to space limitations, readers
are referred to [25] for implementation details. The
procedure is accelerated via a multiscale approach.

4.2  Model  Training 

In our current system, the template is defined by the
operator as a polygonal model. During model training, a
collection of training images is first over-segmented as
described in the previous section. For each over-segmented
image, a human operator is asked to mark candidate
regions that belong to the same object. The system then uses
the downhill-simplex method to minimize the cost function in
Eq. 3, thereby matching the template to the training regions
in a particular image. This process is repeated for all
images in the training set. As more training data is processed,
the system can then semi-automate training. The system
can take a “first guess” at the correct region grouping and
present it to the operator for approval [25].

5  Automatic  Image  Segmentation 

Once  trained,  the  deformable  model  guides  the  grouping 
and  merging  of  color  regions.  The  process  begins  with 
over-segmentation  of  the  input  image.  An  edge  map  is  also 
computed  via  standard  image  processing  methods.  Using 
this  over-segmentation,  candidate  region  groupings  are  de¬ 
termined  based  on  the  color  band-rate  feature  [6]. 

Two major constraints are used in the selection of
candidate groupings. The first constraint is a spatial constraint:
every region in a grouping hypothesis should be adjacent
to another region in the same group. The second constraint
is a region boundary compatibility constraint: if the
average edge strength along the boundary between two regions
exceeds a threshold, then the pair of regions is marked
as incompatible. Finally, the number of candidate
groupings can be further reduced by considering only those that
include at least one region with relatively large area.
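
The selection of candidate groupings can be sketched in Python as a filter over subsets of regions. This is an illustrative sketch under stated assumptions, not the system's implementation: the names `areas`, `adjacency`, `edge_strength`, `edge_threshold`, `min_area`, and `max_size` are hypothetical, the spatial constraint is interpreted as requiring the group to be connected under region adjacency, and the subset enumeration is capped at `max_size` regions because it is exponential otherwise.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def is_connected(group: Sequence[int], adjacency: Dict[int, Set[int]]) -> bool:
    """True if the regions in group form one connected component."""
    members = set(group)
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        region = stack.pop()
        for neighbor in adjacency.get(region, set()):
            if neighbor in members and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == members


def candidate_groupings(areas: Dict[int, float],
                        adjacency: Dict[int, Set[int]],
                        edge_strength: Dict[FrozenSet[int], float],
                        edge_threshold: float,
                        min_area: float,
                        max_size: int) -> List[Tuple[int, ...]]:
    """Enumerate region groups that satisfy the three pruning constraints."""
    # Pairs whose shared boundary carries a strong edge may not be grouped.
    incompatible = {pair for pair, strength in edge_strength.items()
                    if strength > edge_threshold}
    result = []
    for size in range(1, max_size + 1):
        for group in combinations(sorted(areas), size):
            # At least one region with relatively large area.
            if not any(areas[r] >= min_area for r in group):
                continue
            # Region boundary compatibility constraint.
            if any(frozenset(p) in incompatible for p in combinations(group, 2)):
                continue
            # Spatial constraint: the group must hang together.
            if not is_connected(group, adjacency):
                continue
            result.append(group)
    return result
```

Each surviving group would then be passed to model fitting; the constraints only prune hypotheses and do not rank them.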

Local constraints are insufficient for obtaining reliable
segmentation. To gain more reliable segmentation, global
consistency must be enforced [33]. In the global
consistency strategy, for any possible partitioning of the image,
we compute a global cost for the whole configuration:

$$\mathcal{E} = \sum_{i=1}^{n} \eta_i E(g_i) + \gamma n, \qquad (4)$$

where $\gamma$ is a scalar, $n$ is the number of the groupings in
the current image partitioning, $\eta_i$ is the ratio of the $i$th group
area to the total area, and $E(g_i)$ is the cost function for the
group $g_i$ (Eq. 3). In our experiments, $\gamma = 0.04$.

The first term in Eq. 4 is the sum of the model
compatibility for every grouping in the image partition. The second
term corresponds to the code length (number of models
employed), and thereby enforces a minimum description
length criterion, along the lines of [24].
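
As a small worked illustration of the global cost, the Python sketch below computes the area-weighted sum of per-group costs plus a penalty proportional to the number of groups. The function and argument names are hypothetical, and the per-group costs are assumed to have been computed already by model fitting.

```python
from typing import Sequence


def global_cost(group_costs: Sequence[float],
                group_areas: Sequence[float],
                total_area: float,
                gamma: float = 0.04) -> float:
    """Area-weighted model cost plus a per-group description-length penalty."""
    if len(group_costs) != len(group_areas):
        raise ValueError("one cost and one area are needed per group")
    weighted = sum((area / total_area) * cost
                   for cost, area in zip(group_costs, group_areas))
    return weighted + gamma * len(group_costs)
```

For two groups with costs 0.2 and 0.4 covering 30% and 70% of the image, the weighted term is 0.06 + 0.28 = 0.34 and the penalty is 2 x 0.04 = 0.08, giving a global cost of 0.42. Splitting an image into more groups is therefore only worthwhile when the improvement in model fit outweighs the added penalty.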

5.1  Approximating  the  Optimal  Solution 

Eq.  4  does  not  exhibit  the  optimal  substructure  property 
required  for  solution  via  dynamic  programming  methods 
[25].  Furthermore,  after  the  initial  segmentation,  the  num¬ 
ber  of  candidate  regions  is  not  small  in  general.  We  there¬ 
fore  implemented  a  number  of  approximation  algorithms. 
Such  algorithms  tend  to  find  a  near-optimal  partition  within 
a  reasonable  number  of  steps. 

One  such  algorithm,  best-first,  is  greedy.  It  examines 
only  the  local  cost  of  merging  (Eq.  3)  at  each  step.  First,  a 
list  of  all  possible  grouping  hypotheses  is  generated  as  de¬ 
scribed  above.  Once  all  grouping  hypotheses  have  been  fit¬ 
ted  with  shape  models,  we  then  compare  the  merging  cost 
of  different  grouping  hypotheses,  selecting  the  hypothesis 
with  minimum  model  cost.  If  the  cost  is  less  than  a  thresh¬ 
old,  then  the  regions  are  merged.  Any  hypotheses  that  in¬ 
clude  these  merged  regions  are  then  eliminated  from  fur¬ 
ther  consideration.  If  any  unmerged  grouping  hypotheses 
remain,  then  we  select  the  one  with  the  minimum  cost  and 
repeat  the  procedure.  If  the  cost  exceeds  the  threshold  or 
the  hypothesis  list  is  empty,  then  the  procedure  stops. 
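
The best-first procedure can be sketched in a few lines of Python. The sketch assumes that every grouping hypothesis has already been fitted and is supplied as a mapping from a set of region identifiers to its fitted cost; the name `best_first` and this data layout are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List


def best_first(hypotheses: Dict[FrozenSet[int], float],
               threshold: float) -> List[FrozenSet[int]]:
    """Greedily accept the cheapest remaining hypothesis until none qualifies."""
    remaining = dict(hypotheses)
    merged = []
    while remaining:
        best = min(remaining, key=remaining.get)
        if remaining[best] >= threshold:
            break  # the cheapest hypothesis is already too costly
        merged.append(best)
        # Discard every hypothesis that shares a region with the accepted one.
        remaining = {group: cost for group, cost in remaining.items()
                     if not (group & best)}
    return merged
```

Because each accepted grouping removes all overlapping hypotheses, the result is a set of disjoint groupings, but the greedy choice is not guaranteed to minimize the global cost.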

If  the  number  of  candidate  regions  in  the  over¬ 
segmented  image  is  very  large,  the  best-first  strategy  tends 
to  be  inefficient;  it  sometimes  requires  hours  to  segment 
an  image  on  a  standard  workstation  (SGI  R5K  Indy).  This 
led  us  to  explore  approaches  that  approximately  optimize 
global  cost  (Eq.  4).  Due  to  space  limitations,  readers  are 
directed  to  [25]  for  pseudocode  and  details  of  a  simulated 
annealing  solution.  In  our  experiments,  the  convergence  of 
the  simulated  annealing  algorithm,  while  markedly  better 
than  best-first,  is  still  slow.  There  is  an  inherent  tradeoff 
between  annealing  schedule  and  correctness  of  result. 

5.2  Highest  Confidence  First  Algorithm 

A  deterministic  algorithm,  highest  confidence  first  (HCF), 
can  be  used  to  improve  convergence  speed  [8,  21].  The 
HCF  algorithm  as  applied  to  our  problem  is  as  follows: 


1. Initialize the region grouping configuration such that
every region in the over-segmented image is in its own
distinct group $g_i$.

2. Fit models to each region grouping $g_i$. Compute the
global cost $\mathcal{E}_0$ via Eq. 4. Save this configuration as best
found so far, $C_0$.

3. Set $\mathcal{E}_m$ to a very large value.

4. For each pair of adjacent groups $g_i, g_j$ in the current
configuration, compute the global cost $\mathcal{E}_{ij}$ that would
result if $g_i, g_j$ were merged. If $\mathcal{E}_{ij} < \mathcal{E}_m$, then set
$\mathcal{E}_m = \mathcal{E}_{ij}$ and save this merged configuration $C_m$. After this
step, $C_m$ is the configuration with minimum merging
cost for any pair of groups in the current configuration.

5. Use the merged configuration $C_m$ as the new
configuration. If $\mathcal{E}_m < \mathcal{E}_0$, then set $\mathcal{E}_0 = \mathcal{E}_m$ and save this new
configuration as best found so far, $C_0 = C_m$.

6. Terminate when all groups are merged into one, and
output the best configuration $C_0$ and its cost value $\mathcal{E}_0$.
Otherwise, go to 3.
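
The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `fit_cost` is a hypothetical callback standing in for model fitting of a single group, region areas and adjacency are supplied as dictionaries, and every pairwise merge is re-evaluated at each iteration rather than reusing results from the previous one.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Group = FrozenSet[int]


def global_cost(groups: List[Group], areas: Dict[int, float],
                fit_cost: Callable[[Group], float], gamma: float) -> float:
    """Area-weighted sum of group costs plus gamma times the group count."""
    total = sum(areas.values())
    weighted = sum(sum(areas[r] for r in g) / total * fit_cost(g) for g in groups)
    return weighted + gamma * len(groups)


def groups_adjacent(g1: Group, g2: Group, adjacency: Dict[int, Set[int]]) -> bool:
    """Two groups are adjacent if any of their member regions are adjacent."""
    return any(nb in g2 for r in g1 for nb in adjacency.get(r, ()))


def highest_confidence_first(areas: Dict[int, float],
                             adjacency: Dict[int, Set[int]],
                             fit_cost: Callable[[Group], float],
                             gamma: float = 0.04) -> Tuple[List[Group], float]:
    # Steps 1 and 2: one group per region, saved as the best configuration.
    config = [frozenset([r]) for r in sorted(areas)]
    best_config = list(config)
    best_cost = global_cost(config, areas, fit_cost, gamma)
    while len(config) > 1:
        # Step 3: reset the minimum merging cost.
        merged_cost, merged_config = float("inf"), None
        # Step 4: find the adjacent pair whose merge gives the lowest global cost.
        for i in range(len(config)):
            for j in range(i + 1, len(config)):
                if not groups_adjacent(config[i], config[j], adjacency):
                    continue
                trial = [g for k, g in enumerate(config) if k not in (i, j)]
                trial.append(config[i] | config[j])
                cost = global_cost(trial, areas, fit_cost, gamma)
                if cost < merged_cost:
                    merged_cost, merged_config = cost, trial
        if merged_config is None:
            break  # no adjacent pairs remain (disconnected adjacency graph)
        # Step 5: always adopt the merge, and remember it if it is the best so far.
        config = merged_config
        if merged_cost < best_cost:
            best_cost, best_config = merged_cost, list(config)
    # Step 6: report the best configuration seen along the merge sequence.
    return best_config, best_cost
```

Note that merging continues even when a merge increases the global cost; only the bookkeeping of the best configuration decides what is returned.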

In our experience, the computational complexity of HCF
is generally less than that needed to obtain similar
quality segmentation results via the simulated annealing
algorithm [25]. In each HCF iteration, the number of different
merging configurations tested is about $O(n)$, where $n$ is
the number of regions in the over-segmented image. This
is because some results from the previous iteration can be
reused in the next. At each iteration (except the first), the
algorithm need only compute the pairwise merging cost
between all groups $g_i$ and the newly-merged group from the
previous iteration. Thus the total complexity is $O(n^2)$.

6  Examples 

The  system  has  been  tested  on  hundreds  of  images  from 
a  number  of  different  classes  of  cluttered  color  imagery: 
images  of  fruit,  vegetables,  and  leaves  collected  under  con¬ 
trolled  lab  conditions,  and  images  of  fish  obtained  from  the 
world  wide  web.  A  few  examples  are  now  shown. 

The  first  example  shows  results  for  detecting  and  merg¬ 
ing  regions  associated  with  bananas.  The  shape  template 
(Fig.  1(d))  was  trained  using  40  example  images  of  ba¬ 
nanas  at  varying  orientations  and  scales.  The  training  im¬ 
ages  were  excluded  from  the  test  image  data  set.  All  im¬ 
ages  in  the  test  data  set  were  then  segmented  using  the 
trained  model,  as  described  in  Sec.  5.  The  best  first  strategy 
was  employed  in  finding  the  best  image  partition. 

The  resulting  model-based  region  groupings  are  shown 
below  each  of  the  original  images  in  Fig.  3.  In  cases  where 
there  were  multiple  yellow  objects  in  the  image,  the  sys¬ 
tem  recovered  multiple  model-based  groupings  (shown  in 
different  colors).  Segmentation  took  between  30  and  180 
sec.  per  image  on  an  SGI  R5K  Indy  workstation. 

The system correctly grouped regions despite shadows,
variation in illuminant, and shape deformation. Especially
notable are cases where multiple yellow shapes abut each
other. Due to the use of model-based region merging, the
system is able to avoid merging similarly colored, adjacent
but separate objects. The approach is also adept at avoiding
merging objects with their similarly-colored shadows.

Figure 3: Image segmentation example: color images of
bananas in various positions with varying illumination. The
resulting model-based region groupings are shown below each color
input image. If an image contained more than one detected shape,
the shape that the system recognized as most “banana like” in
each image is labeled in light gray. Note that the most similar
shapes are other bent bananas of similar aspect ratio.

As explained in Sec. 4, each region grouping has an
associated vector of shape deformation parameters $a$. The
vector provides a low-dimensional description of each
shape that can be stored and used for recognition. In cases
where multiple objects are present, the system stores a list
of model descriptions for that image.

Figure 5: Leaf image segmentation examples. Each row of the
figure shows one example. Original images are shown in the first
column, followed by over-segmented images used as input to the
merging algorithm. The third image in each row shows the best
model configuration obtained via HCF. The model-based region
merging result is shown as the final image in each row.

Preliminary experiments in using the recovered shape
parameter vectors for object recognition have been
conducted. An example is shown in Fig. 3. The “target” shape
was the banana in the first image (upper left). The
subsequent images are shown in similarity ranking, left to right,
top to bottom. Similarity was determined via Mahalanobis
distance between recovered $a$ vectors. The most similar
shape in each image is shown highlighted in lighter gray
in the labeled image below. The most similar shapes are
other bent bananas of similar aspect ratio.

The next example makes use of the global consistency
strategy to obtain segmentation of tropical leaf images.
This example can be characterized by clutter of many
simple leaves. The leaf model employed in this example was
approximately an oval, as is shown in Fig. 4(a). It was
defined and trained as in the previous example. The training
images were not contained in our test image data set. The
HCF algorithm was used in finding the “best” global
configuration, as described in Sec. 5.2.

The method was tested on a collection of over 100
images of different tropical leaves. Due to space limitations,
not all results can be shown here. Four examples are shown
in Fig. 5. Segmentation took between 30 and 360 sec. per
image using the HCF algorithm. As can be seen, the
system produces a satisfactory segmentation in each case,
despite large deformations. Furthermore, the system does not
merge adjacent, similarly colored regions unless they are
consistent with the deformable shape model.

Figure 6: Example segmentation for images of fish. The original images are shown in the first column, followed by the over-segmented
images used as input to the merging algorithm. The third column shows the models selected in the best merging configuration obtained
via HCF. Finally, the last column depicts the model-based merging.

The  final  example  shows  segmentation  results  for  five 
examples  of  fish  images  obtained  from  the  world  wide  web. 
These  images  are  particularly  challenging,  since  there  is 
greater  shape  and  color  variation,  large  deformation,  and 
clutter.  The  fish  model  used  in  segmentation  is  shown  in 
Fig.  4(b),  and  was  trained  using  about  60  training  images. 
The  test  images  were  excluded  from  the  training  set. 

As  shown  in  Fig.  6,  the  method  recovered  a  deformable 
model  description  of  each  fish  in  the  image.  In  one  case, 
(Fig.  6(a)),  the  orientation  of  the  model  was  incorrectly  es¬ 
timated  for  three  fish.  In  such  a  case,  local  features  might 
be  used  to  resolve  the  orientation  ambiguity.  Despite  clut¬ 
ter,  large  deformation,  shape  variation,  and  partial  occlu¬ 
sions,  the  other  fish  were  accurately  segmented. 

7  Discussion 

In previous approaches to deformable template-based
segmentation, initial model placement is either given by the
operator, or obtained via exhaustively testing the model in
all orientations, scales, and deformations centered at
every pixel (or at random seed pixels). The region-based
approach proposed in this paper significantly reduces the
need to test all model positions.

Issues  of  computational  complexity  were  addressed 
through  the  use  of  various  constraints  as  was  described  in 
Sec.  5,  and  the  use  of  multi-scale  fitting.  However,  the 
complexity  is  still  daunting  in  cluttered  imagery  and  needs 
to  be  improved.  The  major  issue  is  computation  time  re¬ 
quired  to  obtain  a  segmentation  result.  This  led  to  the  eval¬ 
uation  of  different  methods  for  obtaining  “optimal”  region 
groupings. At present, the method is well-suited to
applications where shape segmentation can be precomputed (e.g.,
image database indexing).

If  there  are  shadows  or  partially  overlapping  objects  in 
the  image,  then  the  best-first  strategy  can  sometimes  get 
a  better  result  since  it  can  select  the  most  confident  group 
to  merge  first,  and  avoid  fitting  spurious  objects.  Unfortu¬ 
nately,  the  computational  complexity  of  best-first  strategy 
prohibits  application  in  general  imagery. 

Compared with the best-first strategy, the simulated
annealing approach offers a significant reduction in
computational complexity. However, the degree of reduction in
complexity depends on the annealing schedule, and there
is a trade-off between robustness and speed. The global
consistency strategy (via HCF) offers a
reasonable compromise between speed and accuracy, and is
therefore the preferred method.

The  method  is  able  to  obtain  a  satisfactory  segmentation 
despite  clutter,  variation  in  illuminant,  shape  deformation, 
etc.  Based  on  the  statistical  shape  model,  the  algorithm 
can  detect  the  whole  object  correctly,  while  at  the  same 
time,  avoid  merging  objects  with  background  and  shadow, 
or  merging  adjacent  multiple  objects.  Region  merging  and 
object  identification  are  executed  simultaneously.  Recov¬ 
ered  shape  parameters  can  be  used  directly  in  recognition. 